AdaCB: An Adaptive Gradient Method with Convergence Range Bound of Learning Rate
Authors
Abstract
Adaptive gradient descent methods such as Adam, RMSprop, and AdaGrad achieve great success in training deep learning models. These methods adaptively change the learning rates, resulting in a faster convergence speed. Recent studies have shown that their problems include extreme learning rates, non-convergence issues, as well as poor generalization. Some enhanced variants have been proposed, such as AMSGrad and AdaBound. However, the performances of these alternatives are controversial, and some drawbacks still occur. In this work, we propose an optimizer called AdaCB, which limits the learning rates of Adam within a convergence range bound. The bound is determined by an LR test, and then two functions are designed to constrain the learning rates so that they gradually tend to a constant value. To evaluate our method, we carry out experiments on the image classification task: three models, including Smallnet, Network IN Network, and Resnet, are trained on the CIFAR10 and CIFAR100 datasets. Experimental results show that our method outperforms the other optimizers on both datasets, with accuracies of (82.76%, 53.29%), (86.24%, 60.19%), and (83.24%, 55.04%) for Smallnet, Network IN Network, and Resnet, respectively. The results also indicate that our method maintains a fast convergence speed, like adaptive methods, in the early stage, and achieves considerable accuracy, like SGD (M), at the end.
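The mechanism described in the abstract can be illustrated with a short sketch. The NumPy code below shows an Adam step whose per-coordinate step sizes are clipped into a convergence range; the range endpoints lr_low and lr_high, the constant value the bounds tend to, and the exact shape of the two bound functions are illustrative assumptions, since the abstract does not specify them. In AdaCB the range itself would come from the LR test.

```python
import numpy as np

def adacb_style_step(param, grad, state, t,
                     alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                     lr_low=1e-4, lr_high=1e-2, gamma=1e-3):
    """One Adam step with its per-coordinate learning rates clipped into a
    convergence range [lr_low, lr_high] (in AdaCB, found by the LR test).
    The two bound functions below are assumed forms; both tighten over time
    toward a single constant value lr_final."""
    m, v = state
    m = beta1 * m + (1 - beta1) * grad            # Adam first moment
    v = beta2 * v + (1 - beta2) * grad ** 2       # Adam second moment
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)

    lr_final = 0.5 * (lr_low + lr_high)                        # assumed limiting constant
    lower = lr_final - (lr_final - lr_low) / (1 + gamma * t)   # rises toward lr_final
    upper = lr_final + (lr_high - lr_final) / (1 + gamma * t)  # falls toward lr_final

    step = np.clip(alpha / (np.sqrt(v_hat) + eps), lower, upper)
    return param - step * m_hat, (m, v)

# Usage: keep one (m, v) state per parameter tensor and pass the step
# counter t starting at 1, e.g.  w, state = adacb_style_step(w, dw, state, t)
```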
Similar Papers
Convergence of Gradient Dynamics with a Variable Learning Rate
As multiagent environments become more prevalent we need to understand how this changes the agent-based paradigm. One aspect that is heavily affected by the presence of multiple agents is learning. Traditional learning algorithms have core assumptions, such as Markovian transitions, which are violated in these environments. Yet, understanding the behavior of learning algorithms in these domains...
Quasi-Optimal Convergence Rate of an Adaptive Discontinuous Galerkin Method
We analyze an adaptive discontinuous finite element method (ADFEM) for symmetric second order linear elliptic operators. The method is formulated on nonconforming meshes made of simplices or quadrilaterals, with any polynomial degree and in any dimension ≥ 2. We prove that the ADFEM is a contraction for the sum of the energy error and the scaled error estimator, between two consecutive adaptive...
ADADELTA: An Adaptive Learning Rate Method
We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, var...
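For contrast with the bounded-Adam idea above, the per-dimension rule summarized in this snippet fits in a few lines. This is a minimal NumPy sketch of the published ADADELTA update (no global learning rate to tune); the variable names are our own.

```python
import numpy as np

def adadelta_step(param, grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA step: per-dimension step sizes built from running
    averages of squared gradients and squared past updates."""
    acc_grad, acc_delta = state
    acc_grad = rho * acc_grad + (1 - rho) * grad ** 2                    # E[g^2]
    delta = -np.sqrt(acc_delta + eps) / np.sqrt(acc_grad + eps) * grad   # unit-consistent step
    acc_delta = rho * acc_delta + (1 - rho) * delta ** 2                 # E[dx^2]
    return param + delta, (acc_grad, acc_delta)
```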
Quasi-optimal Convergence Rate for an Adaptive Boundary Element Method
For the simple layer potential V that is associated with the 3D Laplacian, we consider the weakly singular integral equation V φ = f . This equation is discretized by the lowest order Galerkin boundary element method. We prove convergence of an h-adaptive algorithm that is driven by a weighted residual error estimator. Moreover, we identify the approximation class for which the adaptive algorit...
Quasi-Optimal Convergence Rate for an Adaptive Finite Element Method
We analyze the simplest and most standard adaptive finite element method (AFEM), with any polynomial degree, for general second order linear, symmetric elliptic operators. As is customary in practice, the AFEM marks exclusively according to the error estimator and performs a minimal element refinement without the interior node property. We prove that the AFEM is a contraction, for the sum of th...
Journal
Journal title: Applied Sciences
Year: 2022
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app12189389